Control and Cybernetics
Extraction of Polish Noun Senses from Large Corpora by Means of Clustering
Abstract
We investigate two methods of identifying noun senses: one based on clustering lemmas, the other on clustering documents. We adapted the well-known Clustering by Committee algorithm to Polish and tested it on very large Polish corpora. The evaluation, by means of a WordNet-based synonymy test, used the Polish wordnet (plWordNet 1.0). Various clustering algorithms were analysed for extracting document clusters as indicators of the senses of the words that occur in them. The two approaches to word-sense identification are compared, and conclusions are drawn.
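As an illustration of the evaluation idea mentioned in the abstract, the sketch below shows one common form of a WordNet-based synonymy test: for each query noun, a similarity measure must pick the true synonym from a list that also contains decoys, and the score is the fraction of questions answered correctly. This is a minimal sketch under stated assumptions, not the authors' procedure: the context vectors, the cosine measure, the toy questions, and the function names `cosine`, `answer_question` and `wbst_accuracy` are all hypothetical. In a real setting the synonym pairs would come from a wordnet such as plWordNet and the vectors from corpus co-occurrence counts.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def answer_question(query, candidates, vectors):
    """Return the candidate whose vector is most similar to the query's."""
    return max(candidates, key=lambda c: cosine(vectors[query], vectors[c]))


def wbst_accuracy(questions, vectors):
    """Fraction of (query, synonym, candidates) questions answered correctly."""
    correct = sum(
        1
        for query, synonym, candidates in questions
        if answer_question(query, candidates, vectors) == synonym
    )
    return correct / len(questions)


# Hypothetical toy data: context-word counts for a few Polish nouns.
vectors = {
    "samochód": {"jechać": 3, "kierowca": 2, "silnik": 2},
    "auto": {"jechać": 2, "kierowca": 1, "silnik": 1},
    "pies": {"szczekać": 3, "smycz": 1},
    "dom": {"mieszkać": 3, "dach": 2},
    "budynek": {"mieszkać": 1, "dach": 1, "mur": 2},
    "rzeka": {"płynąć": 3, "brzeg": 1, "jechać": 1},
}

# Each question: (query, true synonym, candidate list including decoys).
# In practice the synonym would be taken from the wordnet, not hand-written.
questions = [
    ("samochód", "auto", ["auto", "pies", "dom", "rzeka"]),
    ("dom", "budynek", ["budynek", "pies", "auto", "rzeka"]),
]

accuracy = wbst_accuracy(questions, vectors)
print(f"accuracy: {accuracy:.2f}")
```

The same harness can score any similarity function, which makes it possible to compare measures, or cluster-derived similarities, against the same set of wordnet questions.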
Similar resources
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the rapid growth of spatial trajectory data, and the need to process it and extract useful information and meaningful patterns, has attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...
Semantics-based Multiword Expression Extraction
This paper describes a fully unsupervised and automated method for large-scale extraction of multiword expressions (MWEs) from large corpora. The method aims at capturing the non-compositionality of MWEs; the intuition is that a noun within a MWE cannot easily be replaced by a semantically similar noun. To implement this intuition, a noun clustering is automatically extracted (using distributio...
Relation Extraction using Matrix Factorization Methods
Relation extraction plays an important role in the information extraction domain. Given an initial ontology specifying noun categories, instances of these categories, and text corpora, the relation extraction task consists of extracting the relations that connect instances of these categories. Current research on this topic is mostly based on clustering methods. One w...
OPTIMIZATION OF FUZZY CLUSTERING CRITERIA BY A HYBRID PSO AND FUZZY C-MEANS CLUSTERING ALGORITHM
This paper presents an efficient hybrid method, namely fuzzy particle swarm optimization (FPSO) and fuzzy c-means (FCM) algorithms, to solve the fuzzy clustering problem, especially for large sizes. When the problem becomes large, the FCM algorithm may result in uneven distribution of data, making it difficult to find an optimal solution in a reasonable amount of time. The PSO algorithm does find a go...
Extraction and 3D Segmentation of Tumors-Based Unsupervised Clustering Techniques in Medical Images
Introduction: The diagnosis and separation of cancerous tumors in medical images require accuracy, experience, and time, and have always been a major challenge for radiologists and physicians. Materials and Methods: We received 290 medical images, composed of 120 mammographic images in LJPEG format, scanned in gray-scale at a 50-micron size, and 110 MRI images including T1-weighted, T...
Publication date: 2010